Semantic Classes and Syntactic Ambiguity
Abstract
In this paper we propose to define selectional preference and semantic similarity as information-theoretic relationships involving conceptual classes, and we demonstrate the applicability of these definitions to the resolution of syntactic ambiguity. The space of classes is defined using WordNet [8], and conceptual relationships are determined by means of statistical analysis using parsed text in the Penn Treebank.

1. INTRODUCTION

The problem of syntactic ambiguity is a pervasive one. As Church and Patil [2] point out, the class of "every way ambiguous" constructions (those for which the number of analyses is the number of binary trees over the terminal elements) includes such frequent constructions as prepositional phrases, coordination, and nominal compounds. They suggest that until it has more useful constraints for resolving ambiguities, a parser can do little better than to efficiently record all the possible attachments and move on.

In general, it may be that such constraints can only be supplied by analysis of the context, domain-dependent knowledge, or other complex inferential processes. However, we will suggest that in many cases, syntactic ambiguity can be resolved with the help of an extremely limited form of semantic knowledge, closely tied to the lexical items in the sentence.

We focus on two relationships: selectional preference and semantic similarity. From one perspective, the proposals here can be viewed as an attempt to provide new formalizations for familiar but seldom carefully defined linguistic notions; elsewhere we demonstrate the utility of this approach in linguistic explanation [11]. From another perspective, the work reported here can be viewed as an attempt to generalize statistical natural language techniques based on lexical associations, using knowledge-based rather than distributionally derived word classes.

* This research has been supported by an IBM graduate fellowship and by DARPA grant N00014-90-J-1863.
The comments of Eric Brill, Marti Hearst, Jamie Henderson, Aravind Joshi, Mark Liberman, Mitch Marcus, Michael Niv, and David Yarowsky are gratefully acknowledged.

2. CLASS-BASED STATISTICS

A number of researchers have explored using lexical cooccurrences in text corpora to induce word classes [1, 5, 9, 12], with results that are generally evaluated by inspecting the semantic cohesiveness of the distributional classes that result. In this work, we are investigating the alternative of using WordNet, an explicitly semantic, broad-coverage lexical database, to define the space of semantic classes. Although WordNet is subject to the attendant disadvantages of any hand-constructed knowledge base, we have found that it provides an acceptable foundation upon which to build corpus-based techniques [10]. This affords us a clear distinction between domain-independent and corpus-specific sources of information, and a well-understood taxonomic representation for the domain-independent knowledge.

Although WordNet includes data for several parts of speech, and encodes numerous semantic relationships (meronymy, antonymy, verb entailment, etc.), in this work we use only the noun taxonomy: specifically, the mapping from words to word classes, and the traditional IS-A relationship between classes. For example, the word newspaper belongs to the classes ⟨newsprint⟩ and ⟨paper⟩, among others, and these are immediate subclasses of ⟨material⟩ and ⟨publisher⟩, respectively.¹

Class frequencies are estimated on the basis of lexical frequencies in text corpora. The frequency of a class c is estimated using the lexical frequencies of its members, as follows:

$$\mathrm{freq}(c) = \sum_{\{n \,\mid\, n \text{ is subsumed by } c\}} \mathrm{freq}(n) \qquad (1)$$

The class probabilities used in the section that follows can then be estimated by simply normalizing (MLE) or by other methods such as Good-Turing [3].²

¹ For expository convenience we identify WordNet noun classes using a single descriptive word in angle brackets. However, the internal representation assigns each class a unique identifier.
² We use Good-Turing. Note, however, that WordNet classes are not necessarily disjoint; space limitations preclude further discussion of this complication here.
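The class-frequency estimate in equation (1) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy taxonomy `PARENTS`, the word-to-class map `WORD_CLASSES`, and the counts in `WORD_FREQ` are invented and are not WordNet or Penn Treebank data, and the function names are hypothetical. The sketch uses plain MLE normalization by the total noun count rather than the Good-Turing estimate mentioned in the text.

```python
from collections import defaultdict

# Hypothetical IS-A links (class -> immediate superclasses); not real WordNet data.
PARENTS = {
    "newsprint": ["material"],
    "paper": ["publisher"],
    "material": [],
    "publisher": [],
}

# Hypothetical mapping from words to the classes they belong to directly.
WORD_CLASSES = {
    "newspaper": ["newsprint", "paper"],
    "cardboard": ["material"],
    "tabloid": ["paper"],
}

# Invented lexical frequencies standing in for corpus counts.
WORD_FREQ = {"newspaper": 6, "cardboard": 3, "tabloid": 1}


def subsuming_classes(word):
    """Return every class that subsumes `word`: its own classes plus all ancestors."""
    seen = set()
    stack = list(WORD_CLASSES.get(word, []))
    while stack:
        cls = stack.pop()
        if cls not in seen:
            seen.add(cls)
            stack.extend(PARENTS.get(cls, []))
    return seen


def class_frequencies(word_freq):
    """Equation (1): freq(c) is the sum of freq(n) over all words n subsumed by c."""
    freq = defaultdict(int)
    for word, count in word_freq.items():
        for cls in subsuming_classes(word):
            freq[cls] += count
    return dict(freq)


def mle_probabilities(class_freq, total):
    """Normalize class frequencies by the total number of observed noun tokens."""
    return {cls: f / total for cls, f in class_freq.items()}


if __name__ == "__main__":
    freqs = class_frequencies(WORD_FREQ)
    probs = mle_probabilities(freqs, sum(WORD_FREQ.values()))
    for cls in sorted(freqs):
        print(cls, freqs[cls], probs[cls])
```

Because a word such as newspaper is counted under every class that subsumes it, the resulting class probabilities need not sum to one, which reflects the non-disjointness noted in footnote 2.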
Publication date: 1993